[pull] master from DataDog:master #544
Merged
* Set device.os.type to Linux on all OCSF events
Add a schema-category-mapper that sets ocsf.device.os.type and
ocsf.device.os.type_id (200/Linux) on every event the integration emits.
This gives downstream rules a stable, source-agnostic way to filter for
Linux events, e.g. cross-source detection rules can use
@ocsf.device.os.type:Linux to scope to Linux endpoints without depending
on metadata.product.vendor_name (which encodes the source, not the OS).
Also add profiles: [host] to the SOCKADDR Network Activity sub-pipeline
so device.os.* validates against the OCSF schema there (Network Activity
has no native device attribute).
Test fixtures updated to expect device.os in all 29 ocsf: blocks, and
the SOCKADDR fixture to expect metadata.profiles: [host].
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix OCSF validation: device.os.name, base_event host profile, device.type_id
Address validator errors after adding device.os.type to all sub-pipelines:
- ocsf.device.os.name is required by the OCSF OS object; set it to
"Linux" via a top-level string-builder in the OCSF pre-transformations
sub-pipeline so it applies to every event.
- ocsf.device.name is required to satisfy the Device object's
at_least_one constraint on sub-pipelines that don't otherwise populate
hostname/ip; set to "Unknown" via the same top-level string-builder.
- Base Event class natively includes the host profile in OCSF; declaring
profiles: [host] on the Base Event schema-processor brings the pipeline
in line so device.* validates.
- ocsf.device.type_id was missing from Base Event, SOCKADDR Network
Activity, and SYSCALL Network Activity sub-pipelines; added a
schema-category-mapper (Unknown / 0) to each.
Test fixtures updated to reflect the new fields:
- device.os.name on all 29 fixtures
- device.name on the 11 fixtures that previously lacked it (9
IAM/Device-Config with hostname/ip, 2 Network 4001)
- device.type and device.type_id on the 11 fixtures that previously
lacked the full device shape (9 Base Event, 2 Network 4001)
- metadata.profiles: [host] on the 9 Base Event fixtures
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
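As a rough illustration of the two commits above, the net effect on an event's `ocsf.device` block could be sketched in Python as follows. This is not the pipeline DSL: the function name `stamp_linux_device` is hypothetical, and applying the `name`/`type_id` defaults to every event is a simplification of the per-sub-pipeline mappers described in the messages.

```python
from copy import deepcopy

# Values taken from the commit messages: OS type 200/Linux, OS name "Linux".
LINUX_OS = {"name": "Linux", "type": "Linux", "type_id": 200}


def stamp_linux_device(event: dict) -> dict:
    """Return a copy of `event` with the Linux OS fields and device defaults set.

    Hypothetical helper for illustration only; the real change is expressed
    as schema-category-mappers and a string-builder in the pipeline YAML.
    """
    out = deepcopy(event)
    device = out.setdefault("ocsf", {}).setdefault("device", {})
    device["os"] = dict(LINUX_OS)
    # Defaults only fill gaps, so an existing name or type is preserved.
    device.setdefault("name", "Unknown")
    device.setdefault("type_id", 0)
    device.setdefault("type", "Unknown")
    return out


stamped = stamp_linux_device({"ocsf": {"device": {"hostname": "web-1"}}})
```

A downstream rule can then filter on `ocsf.device.os.type` regardless of which source produced the event.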
…3728)
* Avoid cleanup when cancel called while check running
* Add changelog
* check if cancelled before running jobs
* Add some debug lines for cancel flow
* [OCSF] Zeek/Corelight pipeline
Add OCSF v1.5.0 normalization for Zeek/Corelight logs, covering 7 log
types across 5 OCSF classes (Detection Finding, Network Activity, HTTP
Activity, DNS Activity, File Hosting Activity).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix validate-logs errors in zeek.yaml
Resolve 36 validation errors flagged by the datadog-assets validator:
- Add missing `overrideOnConflict: false` to 3 attribute-remappers
- Fix 2 schema-remapper names to backtick individual fields
- Rename 25 facets to match validator's canonical names and add
`type: integer`/`facetType: range` where required
- Remove 6 facets with unresolvable path conflicts (validator demanded
unique paths with no canonical definition available)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix severity mapping for Detection Finding [2004] Notice
Notice events emit `severity.name` capitalized ("High", "Medium", etc.),
so the lowercase `@severity.name:informational` filters never matched
and the fallback assigned `ocsf.severity_id: 99` while preserving the
capitalized name as `ocsf.severity`. Switch the schema-category-mapper
to filter on the numeric `severity.id` (1-5) which Corelight reliably
emits, and update the notice fixture's expected `severity_id` from 99
to 4 to reflect the corrected mapping.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
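The severity fix above comes down to a case-sensitive name match versus a numeric id match. A minimal Python sketch of the difference, assuming Corelight's `severity.id` values 1-5 line up one-to-one with OCSF `severity_id` 1-5 (the commit implies this but does not spell it out); both function names are hypothetical:

```python
# OCSF severity_id values 1-5 and their captions.
OCSF_SEVERITY = {1: "Informational", 2: "Low", 3: "Medium", 4: "High", 5: "Critical"}
OTHER = 99

# Lowercase keys, mirroring the old `@severity.name:informational`-style filters.
LOWERCASE_NAMES = {name.lower(): sid for sid, name in OCSF_SEVERITY.items()}


def severity_id_by_name(event: dict) -> int:
    """Old behaviour: exact, case-sensitive match on the severity name."""
    name = event.get("severity", {}).get("name")
    return LOWERCASE_NAMES.get(name, OTHER)


def severity_id_by_numeric_id(event: dict) -> int:
    """New behaviour: match on the numeric severity.id instead."""
    sid = event.get("severity", {}).get("id")
    return sid if sid in OCSF_SEVERITY else OTHER


notice = {"severity": {"name": "High", "id": 4}}
old_result = severity_id_by_name(notice)        # falls through to Other
new_result = severity_id_by_numeric_id(notice)  # resolves to High
```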
* Add catch-all category to schema-category-mappers with fallback
Each schema-category-mapper that defines a fallback must also have a
catch-all filter category at the end matching the fallback's values.
Six mappers were missing the trailing catch-all: notice/alert
severity_id (2004), http activity_id/status_id (4002), dns rcode_id,
and dns status_id (4003). Append `query: "*"` -> Other/99 to each.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
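The catch-all rule above can be pictured as a first-match lookup whose last entry matches everything. The sketch below is a Python stand-in for a schema-category-mapper, not the real DSL; the `@rcode_name:...` queries, the tiny query matcher, and the function names are all hypothetical.

```python
def lookup(event: dict, path: str):
    """Walk a dotted path through nested dicts; None if any key is missing."""
    current = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(event: dict, query: str) -> bool:
    """Support only `*` and `@path:value` exact matches (illustrative subset)."""
    if query == "*":
        return True
    path, _, expected = query.lstrip("@").partition(":")
    value = lookup(event, path)
    return value is not None and str(value) == expected


def map_category(event: dict, categories: list) -> tuple:
    """Return (id, name) of the first category whose query matches."""
    for query, value_id, name in categories:
        if matches(event, query):
            return value_id, name
    raise LookupError("no category matched; a trailing catch-all is missing")


def catch_all_matches_fallback(categories: list, fallback: tuple) -> bool:
    """The check described above: last category is `*` with the fallback's values."""
    query, value_id, name = categories[-1]
    return query == "*" and (value_id, name) == fallback


FALLBACK = (99, "Other")
STATUS_CATEGORIES = [
    ("@rcode_name:NOERROR", 1, "Success"),   # hypothetical query
    ("@rcode_name:NXDOMAIN", 2, "Failure"),  # hypothetical query
    ("*", 99, "Other"),                      # trailing catch-all
]
```

With the catch-all in place, every event resolves to some category, and the last category agrees with the declared fallback.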
* Apply PR review feedback for Zeek/Corelight OCSF pipeline
Direct mappings, dead-code removal, correctness fixes, and OCSF validator
cleanups across notice, suricata, conn, ssl, weird, http, dns, and file
hosting sub-pipelines:
- Map directly to OCSF targets where intermediates were unnecessary
(ocsf.time, ocsf.duration, ocsf.traffic.packets, JA3/JA3S algorithm_id,
weird protocol_name).
- Drop dead/auto-generated mappers: notice/suricata category_uid (set by
schema-processor), self-maps of finding_info.uid, event_code, file.hashes
(when unbuilt upstream), suricata community_id correlation_uid, HTTP
version-as-protocol_ver, DNS direction derivation, and the DNS rcode_id
catch-all/fallback (recommended-not-required).
- Convert suricata alert.signature_id event_code from string-builder to
schema-remapper.
- Combine domain/query into single ocsf.query.hostname schema-remapper.
- Fix DNS Activity filters: use rcode_name presence to discriminate
Response/Query instead of dns.answer.name (handles NXDOMAIN responses).
- DNS status_id catch-all renamed Other/99 -> Unknown/0 to satisfy the
OCSF validator's suspicious-Other check.
- File Hosting tx_hosts/rx_hosts: drop the second intermediate field;
grok targets ocsf.{src,dst}_endpoint.ip directly off a single stringify.
- Switch fallback source fields per Jonah's suggestions:
severity -> severity.name, alert.severity -> alert_severity,
http status -> status_msg, dns rcode/status -> rcode_name.
- Notice fixture: use id.orig_h/id.resp_h connection fields instead of
the suricata-style src.
Regenerated zeek_tests.yaml with the OCSF validator (--check-all --write).
All 14 logs pass validation with no errors or warnings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
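The DNS Activity filter change in this commit can be illustrated with a small Python sketch: an NXDOMAIN response carries an `rcode_name` but no answers, so keying on answers would misclassify it as a query. The function name is hypothetical, and the ids are the OCSF DNS Activity values for Query (1) and Response (2).

```python
def dns_activity(event: dict) -> tuple:
    """Classify a DNS log as Response when rcode_name is present, else Query."""
    if event.get("rcode_name"):
        return 2, "Response"
    return 1, "Query"


# An NXDOMAIN response has a response code but an empty answer list.
nxdomain = {"query": "missing.example.com", "rcode_name": "NXDOMAIN", "answers": []}
activity = dns_activity(nxdomain)
```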
* Map Zeek DNS answers to ocsf.answers as dns_answer objects
Use two array-processors to wrap each Zeek `answers` string into a
dns_answer object and append to ocsf.answers: the first selects the
first array element into ocsf.answer.rdata, the second appends
ocsf.answer onto ocsf.answers. Only the first answer is captured (the
pipeline DSL has no per-element iteration), but that covers the common
single-A-record case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add catch-all for activity_id
* Fix validate-logs failure for DNS answers wrapper
The previous array-processor type:select required operation.filter and
operation.valueToExtract per the asset validator, but those only apply
to object arrays - Zeek's `answers` is a primitive string array. Switch
to string-builder + grok-parser to extract the first answer string into
ocsf.answer.rdata, then keep the array-processor append to wrap it into
ocsf.answers as a dns_answer object.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
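In plain Python terms, the wrapper described in the two DNS-answers commits amounts to taking the first string from Zeek's `answers` array and appending it to `ocsf.answers` as an object with an `rdata` key. A minimal sketch, with a hypothetical function name and no claim about how the processors are configured:

```python
def wrap_first_answer(event: dict) -> dict:
    """Return the ocsf fragment holding the first DNS answer, if there is one.

    Only the first element is kept, mirroring the limitation noted in the
    commit message (no per-element iteration in the pipeline).
    """
    answers = event.get("answers") or []
    if not answers:
        return {}
    return {"answers": [{"rdata": str(answers[0])}]}


fragment = wrap_first_answer({"answers": ["93.184.216.34", "93.184.216.35"]})
```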
* Address codex review feedback for file pipeline
- Include `files_red` in the File Hosting [6006] sub-pipeline filter so
redacted file events get OCSF class_uid/activity_id/file fields, not
just the pre-transform metadata.
- Prefer `filename` over `fuid` when populating `ocsf.file.name`; fall
back to `fuid` only when `filename` is absent. The `fuid` mapping to
`ocsf.file.uid` is unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
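The `filename`/`fuid` precedence from the second bullet can be sketched as a simple fallback. The function name is hypothetical; the field names come from the commit message.

```python
def ocsf_file_fields(event: dict) -> dict:
    """Build the ocsf.file name/uid pair: prefer filename, fall back to fuid."""
    fields = {}
    name = event.get("filename") or event.get("fuid")
    if name:
        fields["name"] = name
    # The fuid -> uid mapping is independent of which value supplied the name.
    if event.get("fuid"):
        fields["uid"] = event["fuid"]
    return fields


with_name = ocsf_file_fields({"filename": "report.pdf", "fuid": "FabC12"})
without_name = ocsf_file_fields({"fuid": "FabC12"})
```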
* Drop pipeline intermediates, fix multi-IP grok, restore file.hashes
- is_alert (notice 2004, suricata 2004): string-builder writes directly
to `ocsf.is_alert`; grok-parser converts in place. Drops the
`_is_alert_str` intermediate.
- DNS answers: stringify directly into `ocsf.answer`; grok extracts
`ocsf.answer.rdata` via `a %{data:ocsf.answer.rdata}(,%{data})?` so
the comma-separated multi-IP form parses correctly. Drops the
`_answers_str` intermediate.
- File Hosting tx/rx hosts: stringify directly into
`ocsf.{src,dst}_endpoint`; grok extracts `.ip` via
`g %{ip:ocsf.{src,dst}_endpoint.ip}(,%{data})?` for multi-IP. Drops
the `_tx_hosts_str`/`_rx_hosts_str` intermediates.
- Connection 4001: arithmetic-processor writes total bytes directly to
`ocsf.traffic.bytes`; the schema-processor remapper becomes a
self-map. Drops the `_total_bytes` intermediate (matches the
earlier _total_packets/_duration_ms cleanup).
- Restore `ocsf.file.hashes`: build `tmp_md5`/`tmp_sha1`/`tmp_sha256`
fingerprint objects (algorithm name, integer algorithm_id, value),
array-processor append each into `ocsf.file.hashes`, and self-map
the array inside the 6006 schema-processor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
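Two of the changes above lend themselves to a short Python sketch: the "first value, optionally followed by a comma and the rest" grok shape, and the fingerprint objects appended to `ocsf.file.hashes`. The regex is an approximation of the grok pattern, not a translation the pipeline uses; it assumes the stringified array is comma-joined and that the source fields are Zeek's `md5`, `sha1`, and `sha256`. Function names are hypothetical; the `algorithm_id` values are the OCSF fingerprint ids for MD5, SHA-1, and SHA-256.

```python
import re

# Lazy first group, then an optional ",<rest>" tail, as in `%{data:x}(,%{data})?`.
FIRST_VALUE = re.compile(r"(?P<first>.*?)(?:,.*)?")


def first_of_joined(joined: str) -> str:
    """Return the first item of a comma-joined string (whole string if no comma)."""
    match = FIRST_VALUE.fullmatch(joined)
    return match.group("first") if match else joined


# (source field, OCSF algorithm caption, OCSF algorithm_id)
ALGORITHMS = (("md5", "MD5", 1), ("sha1", "SHA-1", 2), ("sha256", "SHA-256", 3))


def build_hashes(event: dict) -> list:
    """Build one fingerprint object per hash present on the event."""
    hashes = []
    for field, algorithm, algorithm_id in ALGORITHMS:
        value = event.get(field)
        if value:
            hashes.append(
                {"algorithm": algorithm, "algorithm_id": algorithm_id, "value": value}
            )
    return hashes


first_ip = first_of_joined("10.0.0.1,10.0.0.2")
hashes = build_hashes({"md5": "d41d8cd98f00b204e9800998ecf8427e"})
```

Hashes that are absent from the event simply contribute no entry, so the resulting array only contains fingerprints that were actually computed upstream.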
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Add OCSF DNS Activity normalization to coredns pipeline
Map CoreDNS query/response logs to OCSF DNS Activity [4003]. Adds OCSF
facets, a single-class sub-pipeline (no pre-transformation), and the
generated expected OCSF blocks in the test fixtures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Align coredns OCSF facet names with cloudflare and route53
validate-logs flagged five OCSF facet path conflicts. Rename to the
canonical form used by the existing DNS integrations and add the
`type: integer` annotation expected on `ocsf.rcode_id` and
`ocsf.src_endpoint.port`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add facetType range to ocsf.src_endpoint.port facet
validate-logs asks for `facetType: range` on this facet path. Match the
form CI's canonical-suggestion message printed for
ocsf.src_endpoint.port.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* remove redundant fallbacks
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Created by
pull[bot] (v2.0.0-alpha.4)